Expand description
This crate provides approximate quantiles over data streams in a moderate amount of memory.
Order statistics is a rough business. Exact solutions are expensive in terms of memory and computation. Recent literature has advanced approximations but each have fundamental tradeoffs. This crate is intended to be a collection of approximate algorithms that provide guarantees around space consumption.
Modules§
- This is an implementation of the algorithm presented in Cormode, Korn, Muthukrishnan, Srivastava’s paper “Effective Computation of Biased Quantiles over Data Streams”. The ambition here is to approximate quantiles on a stream of data without having a boatload of information kept in memory.
- Greenwald Khanna calculates epsilon-approximate quantiles. If the desired quantile is phi, the epsilon-approximate quantile is any element in the range of elements that rank between
lbound((phi-epsilon) x N)
andlbound((phi+epsilon) x N)
- ‘histogram’ approximates a distribution calculation by counting the number of times samples fall into pre-configured bins. This implementation does not require bins to be equally sized. The user must specify upper bounds on bins via
Bounds
. The implementation includes a +Inf bound automatically. - Misra-Gries calculates an ε-approximate frequency count for a stream of N elements. The output is the k most frequent elements.